Psychological problems in college students like depression, pessimism, [...] becomes a difficult task for the
counselor to keep track of the significant changes that occur in students as a result of depression. But
advances in the image-processing field have led to the development of effective systems that can detect
emotions from facial images in a much simpler way. Thus, we need an automated system that captures facial
images of students and analyzes them for effective detection of depression. In the proposed system, an attempt
is made to use image processing techniques to study the frontal face features of college students and predict
depression. The system is trained with facial features of positive and negative facial emotions. To predict
depression, a video of the student is captured, from which the face of the student is extracted. Then, using
Gabor filters, the facial features are extracted. Classification of these facial features is done using an SVM
classifier. The level of depression is identified by calculating the amount of negative emotions present in the
entire video. Based on the level of depression, a notification is sent to the class advisor, department
counselor or university counselor, indicating the student’s disturbed mental state. The present system works
with an accuracy of 64.38%. The paper concludes with the description of an extended architecture using other
inputs like academic scores, social content, peer opinions and hostel activities to build a hybrid system for
depression detection as future work.

Keywords: Facial features, Feature extraction, Image processing
1. INTRODUCTION

In college students, depression is the result of the social change due to the emergence of the internet,
smartphones and different social media sites. The majority of students tend to conceal their psychological
problems because of the social stigma attached to depression and because of peer pressure. Some students remain
totally unaware of their psychological problems and are thus deprived of help that may prove vital to
their mental health. With a large number of students, it becomes difficult for the counselor to keep track of
the significant changes that occur in students as a result of depression. Thus we need an automated system
that captures images of students and analyzes them for effective depression detection. Facial expressions are
the most important form of non-verbal communication for expressing a person’s emotional or mental state. A
large number of studies on facial feature analysis for emotion recognition from images are currently under
way, and they effectively help in predicting the mental health condition of human beings. This study proposes
an automated system that detects depression levels in students by analyzing frontal face images of college
students. To predict depression, a video of the student is captured, from which the face of the student is
extracted. Then, using Gabor filters, the facial features are extracted. Classification of these facial
features is done using an SVM classifier. The level of depression is identified by calculating the amount of
negative emotions present in the entire video.

A comparison of manual FACS coding and automated FACS coding for finding the facial expressions of
depressed patients showed high similarity between the results of the two methods [1]. Highly depressed
patients were found to exhibit low presence of smile (AU 12) or sadness (AU 15). They showed high
presence of contempt (AU 14) and disgust (AU 10) along with smile. Figure 1 shows the action units found to be
present in depression videos [1]: (a) AU 10, Disgust; (b) AU 12, Happy; (c) AU 14, Contempt; (d) AU 15,
Sad.





(a) AU 10 (b) AU 12 (c) AU 14 (d) AU 15 


Figure 1. Action Units found to be present in depression videos [1]— (a) AU 10 — Disgust, (b) AU 12 — 
Happy, (c) AU 14 — Contempt, (d) AU 15 — Sad 


The results pointed out that the most accurate action unit for depression detection was AU 14 (the action
unit related to contempt). In [2], depression was identified by analysing facial landmark points. The
distances between them were computed using Euclidean and city block distance methods. Here both video and
audio features are extracted, fused together and then classified. In [3], a cross-database analysis of three
main datasets, the Black Dog Institute depression dataset (BlackDog), the University of Pittsburgh depression
dataset (Pitt) and the Audio/Visual Emotion Challenge depression dataset (AVEC), has been done, which analyses
the three datasets individually as well as in combination for detection of depression features. The dataset
was generalized into eye activity data, head pose data, feature fusion data and hybrid data. Of all, the eye
activity modality showed better performance. The results indicated that the more variability there is in the
training data, the better the testing results will be. In [4], three different methods are discussed for
emotion recognition. The first is the use of AU rather than AAM features for classification, where AU 14
proved to be the most accurate AU for depression identification. The second method uses the appearance
features from the AAM for classification with SVM, and the third is multimodal fusion of vocal and video
features. This study claims that during clinical interviews of the depressed, the depression symptoms are
communicated nonverbally and can be detected automatically. Another study for finding depression from facial
features has been done by measuring ‘Multi-Scale Entropy’ (MSE) over time on the patient interview video [5].
MSE captures the variations that occur in the video across a single pixel. The videos of patients who had
lower depression levels were highly expressive of their emotions and such videos showed high entropy levels;
otherwise the entropy level was low.

In [6], patients were asked to wear devices that observe their heart rate, sleep pattern, reduction in
social interaction and GPS location (to check whether they are skipping work, etc.) for depression analysis.
Data collection from depressed patients has also been done in [7] by showing them film clips to capture their
facial expressions of emotion, and by giving them the task of recognizing negative and positive emotions
from various facial pictures. In [8], for a video, the face region is first manually initialized and then a
KLT (Kanade-Lucas-Tomasi) tracker is used to extract curvature information from the image. The video based
approach showed higher accuracy as it generalizes the face region more accurately. A technique for face
recognition with the help of Gabor wavelets has also been proposed [9]. Here recognition of faces
invariant to pose and orientation is done. The features extracted are classified with the help of an SVM
classifier. This framework claims to outperform other face recognition techniques. The work in [10] proposes
an improved face recognition system which uses Stationary Wavelet Transform for feature extraction and
Conservative Binary Particle Swarm Optimization for feature selection. The proposed method claims to give
good performance under cluttered background and to be effective and robust to changes due to
illumination, occlusion, and expression. Utilization of landmark points [11] to compute the LBPH of facial
features reduces the dimension of the LBP histogram, which is used for face detection. Too few landmark points
result in loss of features. Therefore more landmark points need to be extracted to improve the true positive rate of
the recognition process. In [12] the eye and eyebrow features are detected with 4 and 3 feature points for each 
eye and eyebrow respectively. One can also divide the eyebrow into three equal parts: the inner, the centre 
and the outer part as in [13]. Feature points can also be detected using different template matching techniques 
[14]. Facial expression recognition can also be done in two phases: manually locating fourteen points in face 
region and create a graph with edges [15] that connect such points and then training artificial neural networks 
to recognize the six basic emotions. The process of facial feature extraction can also be done using Artificial 
Neural Networks Multilayer Perceptron (MLP) with back-propagation algorithm training the ANN with a 
number of examples, called learning set [16] and then assigning weights to make the network capable of 
classifying facial expressions. Features of video and audio data are extracted from the video using
a Motion History Histogram (MHH), which represents the characteristics of the subtle changes that occur in the
facial and vocal expressions of the depressed [17]. Emotion recognition from faces can also be done using
Radon and wavelet transforms. The Radon transform projects the 2D image into Radon space and the
DWT extracts the coefficients at the second level of decomposition [18]. The main facial
features chosen are the eye, nose and mouth regions, which can be extracted by applying a Haar feature based
AdaBoost algorithm. This strategy reduces the face preprocessing time for large databases. Facial Action
Units are also being recognized, where a combination of various facial action units can form distinct
complex facial expressions for better analysis [19]. If the students' depressed emotions are mapped to their
actions in the classroom, their emotional state can be observed to tell whether they are depressed
or not, and based on this the instructor can help the student by paying careful attention to that particular
student, as in [21]. If different faces in the same scene show a similar positive or negative
emotion, this helps to understand the overall situation of the scene, that is, whether the subjects in the
scene are happy or whether something is going wrong in the scene, as in [24]. The work in [25]
proposes a system that identifies depression in college students by finding out the presence of low level of 
happy features in frontal face videos of students. If the happy features are low in the video the student is 
predicted as having depression. In [26], emotion recognition is based on speech signal
processing and emotion training recognition. The prosodic parameters from the speech signals and the facial
features from the video signals are extracted and classified in parallel. The results of both classifiers are
combined using ‘Bimodal’ integration for the final expression recognition result. A face recognition system which
represents a face using Gabor-HOG features is proposed in [27]. The face image is filtered using a Gabor 
Filter bank. The Gabor magnitude images are obtained and the Histogram of Oriented Gradient is computed 
on these magnitude images. The results show that the fusion of both the methods outperforms the 
performance of both the processes when performed individually. A feature selection algorithm is proposed in 
[28], which uses 2D Gabor wavelet transformation to process only the eye and nose regions of face images 
which shows higher accuracy in detection of multi-pose and multi expression face. Table 1 shows analysis 
tabulated. 


Table 1. Analysis Tabulated

Papers | Depression features extracted | Limitations | Future scope
"Social Risk..." | Action Units | Interviews in general are less structured. | Depression related questionnaires may capture depressive facial expressions.
"Cross-cultural ..." | Eye movement and head pose movement | Training on specific datasets prevents generalizing to different observations. | More varied datasets can be created.
"Discriminating clinical ..." | Unsupervised features: Multi Scale Entropy, dynamical analysis, observability features | Unsupervised features are used in an exploratory setting. | Features can be classified according to their discriminatory power.
"Facial geometry..." | Facial landmarks (video) and statistical descriptors (audio) are fused. | Non-depressed individuals are not classified properly. | Optimizing of features for detection of non-depressive features.
"Video-based..." | Face region was manually initialized and then tracked with KLT. | Reinitialization of the face region is required if the tracked points are below a threshold. | Can consider the face as a whole for the entire video.
2. RESEARCH METHODOLOGY

In [1], AU 14 was identified as the most accurate action unit for depression detection. Based on this
finding, the current study proposes a system that is trained with features of happy, neutral, contempt and
disgust faces. Then, in the testing phase, videos of college students are collected while they answer
different questionnaires. The students’ facial features are extracted and classified by an SVM classifier for
depression detection. Depression is detected from the overall presence of happy, neutral, contempt and
disgust features throughout the video frames, and the student is classified as having low, moderate or high
depression. The architecture of the proposed automated system can be modeled in the following
way.


2.1. Proposed Architectural Diagram 
Figure 2 shows architectural diagram for the proposed ‘Depression Detection’ system. 


[Figure 2 flowchart. Training: student’s images captured showing Happy, Neutral, Contempt and Disgust emotion
-> facial features extracted from each input image using Gabor filter bank -> SVM classifier. Testing: video
captured while student answers questionnaires -> facial features extracted from each frame of video -> SVM
classifier. Decision "Depression features present?": NO -> Not Depressed; YES -> Depressed, then by level of
depression features: LOW -> Notify Class Advisor; MODERATE -> Notify Department Counsellor; HIGH -> Notify
University Counselor.]


Figure 2. Architectural diagram for the proposed ‘Depression Detection’ system 






2.2. Description of Proposed Architectural Diagram. 
2.2.1. Training Dataset Creation 

In addition to the happy, contempt and disgust emotions, a ‘Neutral’ face implies lack of
interest, or an emotionless face which may be put forth by the depressed. The input is consequently a dataset
of happy, neutral, contempt and disgust faces. For collecting the input dataset, a GUI is created that captures
images of the student for each of the four emotions, as shown in Figure 3 below:





[Screenshot of the training dataset capture GUI, titled ‘TRAINING DATASET CREATION’, with buttons: Capture
Happy Face, Capture Neutral Face, Capture Contempt Face, Capture Disgust Face]











Figure 3. Faces of student captured for training set 
The training dataset contains 40 images each of Happy, Neutral, Contempt and Disgust
faces, giving a total of 160 images in the input training dataset.


2.2.2. Face Detection and Feature Extraction 

Once the training set is created, the face in each image is detected using the Viola-Jones face
detection algorithm. This algorithm makes use of Haar features: when these are convolved across the
image, high output values are obtained only in regions that match the pattern of the Haar features. Using the
AdaBoost algorithm and cascading classifiers, it then detects a face as in Figure 4 (b). Facial features from
each face image are extracted using Gabor filters. A Gabor filter bank of 40 filters is created using 5 scales
(2, 3, 3.5, 4 and 5) and 8 orientations (0, 23, 45, 68, 90, 113, 135 and 158) as in [20]. The Gabor filter bank
of 40 filters is shown in Figure 4 (c). For a detected face, the extracted Gabor features are shown in
Figure 4 (d).
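
As a rough illustration of how a 40-filter bank can be assembled from the 5 scales and 8 orientations listed
above, the following Python sketch builds the real part of each Gabor kernel. It is not the implementation
used in this study: the function names, the 15 x 15 kernel size, the aspect ratio `gamma`, the reading of
"scale" as the Gaussian standard deviation, the choice of wavelength as twice the scale, and the
interpretation of the orientations as degrees are all assumptions made for the example.

```python
import math

# Values taken from the text; orientations are assumed to be in degrees.
SCALES = [2, 3, 3.5, 4, 5]
ORIENTATIONS = [0, 23, 45, 68, 90, 113, 135, 158]


def gabor_kernel(scale, theta_deg, size=15, gamma=0.5):
    """Real part of a 2-D Gabor kernel as a size x size list of lists.

    Assumptions (not from the paper): sigma = scale, wavelength = 2 * scale,
    zero phase offset, odd kernel size.
    """
    theta = math.radians(theta_deg)
    sigma = float(scale)
    wavelength = 2.0 * scale
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates into the filter's orientation.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / wavelength)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def build_gabor_bank():
    """One kernel per (scale, orientation) pair: 5 x 8 = 40 filters."""
    return {(s, o): gabor_kernel(s, o) for s in SCALES for o in ORIENTATIONS}


bank = build_gabor_bank()
print(len(bank))  # number of filters in the bank
```

Convolving a cropped face with every kernel in `bank` and stacking the responses would give one long feature
vector per image, which is the role the Gabor feature vector plays in the next step.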


Figure 4. (a) Input image, (b) Face detected, (c) Gabor Filter Bank (d) Gabor Features 


For every image, a Gabor feature vector is formed as an ‘n x 1’ column vector as in Figure 5 (a).
The feature vectors of all the input images are computed and combined to form an ‘n x 160’ feature
set as in Figure 5 (b). The dimension of this feature set is very high, so Principal Component Analysis
(PCA) is applied to it for dimensionality reduction. Thus we get a ‘160 x 160’ reduced-dimension
feature set after applying PCA as in Figure 5 (c). This ‘Gabor Feature Set’ is the input feature set
for training. A class is assigned to each feature vector. The Happy and Neutral images are considered the
positive class and are assigned the value ‘+1’, and the Contempt and Disgust images are considered the
negative class and are assigned the value ‘-1’. Finally we get a Gabor Feature Set for training with
dimension ‘160 x 161’, with the 161st column holding the class value, as in Figure 5 (d).
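
The shape bookkeeping in this step (n x 160 features, 160 x 160 after PCA, 160 x 161 with the class column)
can be checked with a short NumPy sketch. This is only an illustration under stated assumptions: random
numbers stand in for the real Gabor features, `n_features = 500` is a placeholder for the much larger n, PCA
is done through an SVD of the mean-centred data (one common way to compute it, not necessarily the one used
here), and the images are assumed to be ordered Happy, Neutral, Contempt, Disgust with 40 images each.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_images = 500, 160            # placeholder n; the real n is far larger
features = rng.random((n_features, n_images))  # stand-in for the n x 160 Gabor feature set

# PCA via SVD on mean-centred samples (rows = images, columns = features).
samples = features.T                        # 160 x n
centred = samples - samples.mean(axis=0)
u, s, _ = np.linalg.svd(centred, full_matrices=False)
reduced = u * s                             # 160 x 160 principal-component scores

# Assumed image order: 40 Happy, 40 Neutral (+1), then 40 Contempt, 40 Disgust (-1).
labels = np.concatenate([np.ones(80), -np.ones(80)])
training_set = np.column_stack([reduced, labels])  # 160 x 161, last column = class

print(reduced.shape, training_set.shape)
```

With 160 samples, PCA can yield at most 160 components, which is why the reduced set is 160 x 160 regardless
of how large n is.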


[Figure 5 panels: variable views of (a) featureVector for one image; (b) featureVector for all 160 images;
(c) featureVectorPCA, 160x160 double; (d) gaborFeatureSetTraining, 160x161 double]


Figure 5. (a) Feature vector for one image; (b) Feature vector set for 160 images; (c) PCA applied feature set;
(d) Feature set assigned with classes


2.2.3. Dataset Creation for Testing 

For testing, a GUI is created in which the student is given a link to answer a simple online ‘Depression
Analysis Test’. The system captures the frontal face video of the student using the system webcam. This
video is converted into frames; from each frame the face is cropped and the Gabor features are extracted
in the same way as in the training phase. The Gabor feature vectors of all the frames are concatenated to form
a test feature set. The capture GUI is shown in Figure 6 (a). For a sample video of 160 frames, the test
feature set is as shown in Figure 6 (b).
[Figure 6 panels: (a) ‘DEPRESSION ANALYSIS TEST’ GUI with CAPTURE VIDEO and CONVERT TO FRAMES controls,
showing Frame 160; (b) variable view of testFeatureSet, 160x160 double]


Figure 6. (a) GUI for capturing student’s video for testing; (b) Test Feature Set for 160 frames 


2.3. Classification with SVM 

The input feature set is given to a Support Vector Machine (SVM) classifier for training. The SVM
is a model that separates the two classes with the widest possible margin between them; this is what makes
the split the best one. The separating boundary is called the hyperplane. The training points nearest to it
are called the support vectors.


$$ w^{T}x + b = 0 \quad \text{(equation of hyperplane)} \tag{1} $$

$$ f(x) = \sum_{i} \alpha_i y_i \,(x_i^{T}x) + b \quad \text{(equation of function)} \tag{2} $$

where the $x_i$ are the support vectors. Since $w^{T}x + b = 0$ and $c(w^{T}x + b) = 0$ define the same
plane, $w$ and $b$ can be scaled so that for positive support vectors $w^{T}x_{+} + b = +1$ and for negative
support vectors $w^{T}x_{-} + b = -1$. Then the margin is given by:

$$ \frac{w}{\|w\|}\cdot(x_{+} - x_{-}) = \frac{w^{T}(x_{+} - x_{-})}{\|w\|} = \frac{2}{\|w\|} \tag{3} $$

To obtain the optimal hyperplane we need to maximize the margin, or equivalently minimize
$\frac{1}{2}(w \cdot w)$. Since this is a constrained optimization problem, it can be converted to an
unconstrained optimization problem by using Lagrange multipliers $\alpha_i$:

$$ L(w,b) = \frac{1}{2}(w \cdot w) - \sum_{i} \alpha_i y_i (w \cdot x_i) - \sum_{i} \alpha_i y_i b + \sum_{i} \alpha_i \tag{4} $$

Here $L$ has to be minimized with respect to $w$ and $b$ and maximized with respect to the multipliers
$\alpha_i$. First, we set the derivative of the Lagrangian with respect to $b$ to zero to get:

$$ \frac{\partial L}{\partial b} = -\sum_{i=1}^{m} \alpha_i y_i = 0 \tag{5} $$

where $m$ is the number of feature vectors. This is one of the constraints we have now. Then we set the
derivative of the Lagrangian with respect to $w$ to zero to get:

$$ \frac{\partial L}{\partial w} = w - \sum_{i=1}^{m} \alpha_i y_i x_i = 0 \;\Rightarrow\; w = \sum_{i=1}^{m} \alpha_i y_i x_i \tag{6} $$

where $m$ is the number of training samples. When we substitute this weight expression into the original
expression of the Lagrangian:

$$ L = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \tag{7} $$

Thus the decision rule depends only on dot products between samples. Given a point ‘z’, the decision
whether the point belongs to class ‘+1’ or class ‘-1’ is:

$$ D(z) = \operatorname{sign}\Big(\sum_{i} \alpha_i y_i \, x_i^{T} z + b\Big) \tag{8} $$

If the sign is positive then ‘z’ is classified to class ‘+1’; if negative, ‘z’ is classified to class ‘-1’.
The SVM classifier classifies the test data and gives the predicted classes as shown in Figure 7.
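
To make the decision rule concrete, here is a small pure-Python sketch that evaluates the weight vector, the
margin and the sign-based decision on a toy two-dimensional problem. The support vectors, multipliers and
bias are hypothetical values chosen by hand so that the constraint $\sum_i \alpha_i y_i = 0$ holds and both
support vectors lie exactly on the margin; they are not taken from the trained classifier in this study.

```python
import math

# Hypothetical toy problem: one support vector per class.
support_vectors = [(1.0, 1.0), (-1.0, -1.0)]
labels = [+1, -1]
alphas = [0.25, 0.25]   # chosen so that sum(alpha_i * y_i) = 0
bias = 0.0


def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(p * q for p, q in zip(a, b))


def weight_vector():
    """w = sum_i alpha_i * y_i * x_i."""
    dim = len(support_vectors[0])
    return [
        sum(a * y * x[d] for a, y, x in zip(alphas, labels, support_vectors))
        for d in range(dim)
    ]


def decision(z):
    """D(z) = sign(sum_i alpha_i * y_i * (x_i . z) + b), returned as +1 or -1."""
    score = sum(
        a * y * dot(x, z) for a, y, x in zip(alphas, labels, support_vectors)
    ) + bias
    return 1 if score >= 0 else -1


w = weight_vector()
margin = 2.0 / math.sqrt(dot(w, w))   # margin width 2 / ||w||

print(w, margin, decision((2.0, 0.0)), decision((-3.0, 1.0)))
```

Note that `decision` never needs `w` explicitly: it only uses dot products between the support vectors and
the query point, which is the property the derivation above arrives at.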
Figure 7. Predicted classes for the 160 test frames

2.3.1. Depression Level Identification

To identify the level of depression from the video, we need to find the total amount of
negative emotion in the video frames. The student’s emotion level may change over the duration of
the video. The video is therefore divided into three parts of equal duration. As depicted in Table 2:

• If all three parts of the video have more positive emotions, the student is classified as having
  ‘No Depression’.
• If the first two parts of the video show positive emotion and the third part shows negative emotion, the
  video is classified as ‘Low Depression’, since only the end part of the video shows negative emotion.
• If two parts of the video show positive emotion and the negative part is not the last one, the student may
  be suffering from ‘Mild Depression’, as most parts of the video show positive emotion.
• If at least two of the three parts of the video show negative emotions, the student is mostly
  showing negative expressions and is predicted as having ‘High Depression’.
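
The rules above can be sketched in a few lines of Python. This is an illustrative reading of the rules, not
the code used in the study: the function names are invented for the example, a part is labelled ‘Positive’
only when positive frames strictly outnumber negative ones (ties fall to ‘Negative’, which is an assumption),
and when the frame count is not divisible by three the parts are cut by integer division (for 160 frames:
53, 53 and 54), which is also an assumption.

```python
def part_label(frame_classes):
    """'Positive' if +1 frames outnumber -1 frames in this part, else 'Negative'."""
    positives = sum(1 for c in frame_classes if c == 1)
    negatives = sum(1 for c in frame_classes if c == -1)
    return "Positive" if positives > negatives else "Negative"


def split_three(frame_classes):
    """Split the per-frame classes into first, middle and last parts."""
    n = len(frame_classes)
    a, b = n // 3, 2 * n // 3
    return frame_classes[:a], frame_classes[a:b], frame_classes[b:]


def depression_level(frame_classes):
    """Map per-frame classes (+1 / -1) to a depression level via the three-part rule."""
    first, middle, last = (part_label(p) for p in split_three(frame_classes))
    positive_parts = [first, middle, last].count("Positive")
    if positive_parts == 3:
        return "No Depression"
    if positive_parts == 2:
        # Only the end of the video is negative -> Low; otherwise -> Mild.
        return "Low Depression" if last == "Negative" else "Mild Depression"
    return "High Depression"


# Example: first two thirds positive, last third negative.
print(depression_level([1] * 106 + [-1] * 54))
```

The same function can be applied to either the manually assigned classes or the classifier's predicted
classes of a video.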


Table 2. Depression Level Identification Table

Features present in each time duration (Happy, Neutral: positive class, ‘Positive’; Contempt and Disgust:
negative class, ‘Negative’)

First Part of Video | Middle Part of Video | Last Part of Video | Depression Level
Positive | Positive | Positive | No Depression
Positive | Positive | Negative | Low Depression
Positive | Negative | Positive | Mild Depression
Positive | Negative | Negative | High Depression
Negative | Positive | Positive | Mild Depression
Negative | Positive | Negative | High Depression
Negative | Negative | Positive | High Depression
Negative | Negative | Negative | High Depression


3. EXPERIMENTAL RESULTS AND ANALYSIS


Here, videos of five different students were taken for experimental analysis. For a single video, each
frame was analysed manually and, based on the emotion present, assigned to the positive ‘+1’ or
negative ‘-1’ class. These are thus the actual classes of the test video frames. The
classifier predicted each frame to belong to either the positive or the negative class.


Table 3. Confusion Matrix for video I

Emotion | Negative (Actual) | Positive (Actual)
Negative (Predicted) | 65 | 16
Positive (Predicted) | 41 | 38
Total | 106 | 54

Table 3 represents the confusion matrix of actual and predicted classes for the test video frames.
Overall, 160 frames of the test video were considered. Of the 106 frames with actual negative emotion, 65
were correctly classified as negative and the remaining 41 were incorrectly classified as positive. Of the 54
frames with actual positive emotion, 38 were correctly classified as positive and the remaining 16 were
wrongly classified as negative. For this particular video the classifier worked with an accuracy of 64.38%.
Table 4 shows performance metrics of the system for a sample video.
Table 4. Performance metrics of the system for a sample video

Performance | Value (%)
Accuracy | 64.38
Error | 35.62
Sensitivity | 61.32
Specificity | 70.37
Precision | 80.25
False positive rate | 29.63
F1 score | 69.52
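
The values in Table 4 can be recomputed from the confusion matrix of video I. The sketch below does this in
plain Python; the function name is hypothetical, and it treats the negative-emotion class as the class being
detected (so the 65 correctly identified negative frames are the true positives), which is an inferred
reading that reproduces the tabulated figures rather than something stated explicitly in the text.

```python
def performance_metrics(tp, fn, fp, tn):
    """Standard binary-classification metrics, returned as percentages."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)      # recall on the detected class
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    values = {
        "accuracy": accuracy,
        "error": 1 - accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "false_positive_rate": fp / (fp + tn),
        "f1_score": f1,
    }
    return {name: 100 * value for name, value in values.items()}


# Video I, with "negative emotion" as the detected class (assumed reading):
# 65 negative frames found, 41 negative frames missed,
# 16 positive frames flagged as negative, 38 positive frames left as positive.
m = performance_metrics(tp=65, fn=41, fp=16, tn=38)
for name, value in m.items():
    print(f"{name}: {value:.2f}")
```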


As shown in Table 5, five videos were considered for testing. For all the five videos first 160 frames 
were considered for testing. The video was then divided into three equal parts. The sum of the positive and 
the negative emotion for each part was found out. If positive emotions were more compared to negative 
emotions then that part of the video was labeled as ‘Positive’. The “Actual Emotion State’ of the videos was 
found out by calculating the amount of the positive and the negative emotions present in the actual class. 
Similarly, the ‘Predicted Emotion State’ of the videos was found out by calculating the amount of the 
positive and the negative emotions present in the predicted class. 


Table 5. Identifying Level of Depression

Video | Actual -ve | Actual +ve | Actual Emotion State | Predicted -ve | Predicted +ve | Accuracy (%) | First Part Emotion | Second Part Emotion | Third Part Emotion | Predicted Emotion State
I | 106 | 54 | High Depression | 81 | 79 | 64.38 | Negative | Positive | Negative | High Depression
II | 83 | 77 | Mild Depression | 72 | 88 | 51.88 | Positive | Negative | Positive | Mild Depression
III | 41 | 119 | Not Depressed | 84 | 76 | 55.63 | Negative | Negative | Positive | High Depression
IV | 88 | 72 | Mild Depression | 91 | 69 | 54.38 | Positive | Negative | Negative | High Depression
V | 101 | 59 | High Depression | 79 | 81 | 42.50 | Positive | Positive | Negative | Low Depression


Out of the five videos, Videos I and II showed the same actual and predicted emotional state. One of
the videos, Video IV, with Mild Depression, was predicted as ‘High Depression’. For the remaining two
videos, Videos III and V, the actual and the predicted emotional states were contradictory. The system works
with a maximum accuracy of 64.38%. If the student is predicted as having ‘Low Depression’, the ‘Class
Advisor’ is sent a notification mail indicating the mental state of the student. If the student has ‘Mild
Depression’, the ‘Class Advisor’ and the ‘Department Counsellor’ are notified. If the student has ‘High
Depression’, along with the ‘Class Advisor’, the ‘Department Counsellor’ and the ‘University Counsellor’ are
also informed about the student’s disturbed mental state.
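
The escalation policy described in this paragraph amounts to a lookup from predicted level to recipients. A
minimal sketch follows; the dictionary and function names are hypothetical, the empty list for ‘No
Depression’ is an assumption (the text does not describe a notification in that case), and actually sending
mail is left out.

```python
# Recipients per predicted depression level, as described in the text.
RECIPIENTS = {
    "No Depression": [],  # assumption: nobody is notified
    "Low Depression": ["Class Advisor"],
    "Mild Depression": ["Class Advisor", "Department Counsellor"],
    "High Depression": ["Class Advisor", "Department Counsellor", "University Counsellor"],
}


def recipients_for(level):
    """Return the list of people to notify for a predicted depression level."""
    return list(RECIPIENTS[level])


print(recipients_for("Mild Depression"))
```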

The analysis of this work indicates that, using the algorithm proposed in the current study, the presence of
depression features can be found even in a video of short duration. This process can in turn
be applied to a video of any longer duration to identify depression features. This work
indicates that if the system is trained effectively with the images of depression features alone, the
identification of depression in videos can be done with the video modality alone. Many of the previous works
dealt with identification of all six basic human emotions, but here only four main
emotions (happy, contempt, disgust and neutral), which are mainly found in the depressed as in
[1], are considered. This in turn reduces the training and testing overload and improves the classifier
performance. In this work, the main focus was to find depression in students who have not formerly been
diagnosed with depression. This system does not make use of any standard emotion recognition databases for
training. Instead it captures the student's own facial emotions for training the classifier. The testing
video is captured at the same time, with the same camera and under the same background conditions. This helps
the classifier to efficiently identify emotions from the video of the same person whose images were taken
earlier for training the classifier.
4. CONCLUSION AND FUTURE WORK

This study was undertaken to find the level of depression in five different videos of college
students. The presence of ‘Happy’ and ‘Neutral’ (positive emotion) and ‘Contempt’ and ‘Disgust’ (negative
emotion) facial features, which are prominent in depression videos, was found and analysed. The
datasets for training and testing were captured separately and their facial features were classified
using a Support Vector Machine classifier. The amount of positive and negative emotion in each video
was analysed and the videos were predicted as videos with ‘High Depression’, ‘Mild Depression’ or ‘Low
Depression’. The classifier predicted the outcomes with a maximum accuracy of 64.38%.

The larger the number of training samples, the more accurate the classifier prediction will be. The
testing videos captured have more than a thousand frames, out of which only the first 160 frames were
considered here for testing. In future work, this process can be done for the entire video by finding the
key frames of the video using a key frame extraction technique. The current study deals
only with recent videos of the student. However, for more accurate depression detection, the history of
the student should also be taken into consideration. Therefore, in future work, more videos of the same
student, taken at different times, can be considered. This may help to analyse and compare the past
and the present mental state of the student and provide more information to the process of depression level
identification.

Depression detection from videos alone forms only a part of the whole process of identifying 
depression. Those students, who may be classified as not depressed, may be victims to depression in the 
future. For this reason, their other activities have to be continuously monitored. This includes the continuous 
monitoring of their academic activities, their extra curricular activities and also their social activities. 
Monitoring academic activities includes monitoring the student’s grades and attendance. A decrease in grades
or attendance may also be due to a student’s extracurricular activities, like engaging in sports or arts. If a
student’s grades or attendance are poor and they are also not active in other areas like arts or sports, then
they may be at a high risk of falling into depression. Hence students’ extracurricular activities also have to
be continuously monitored for identification of depression. In addition to this, there should also be a way of
monitoring a student’s social media content because if the students’ social media content show a negative 
attitude towards life, then such a student may be a victim of stress and depression. Furthermore, for students 
residing in hostels, input from the hostel authorities regarding the activities of a student within the hostel 
should also be considered for monitoring the students’ day to day activities. If the student leaves the hostel 
premises to college, but in turn skips classes by indulging in other negative activities, then there is a risk of 
the student falling into a negative state of mindset which may eventually lead to depression. The future work 
to this study is to form an elaborate model of depression identification process, by taking all the above 
mentioned factors into consideration and combining it with the current work of identifying depression with 
images. 